This vignette showed the applications of TF activity that estimated from single-cell RNA sequencing data (Smart-seq2) during mouse HSC formation (Zhou et.al. 2018 & Wang et.al. 2022).
The single-cell RNA-sequencing data including 6 sequential cell populations during HSC development (aortic endothelial cells (AECs) and HSC-primed hemogenic-endothelial cells (HECs) from E10 aorta-gonad-mesonephros (AGM) region, T1 and T2 pre-HSCs from E11 AGM, E12 & E14 fetal liver(FL) HSCs, and HSCs from adult bone marrow).
The TF activity estimation followed with the workflow at metaRegulon vignette. Here, we don’t repeat the GRN inference and TF activity estimation steps, but showing the following steps such as cell-type-specificity analysis.
The raw expression matrix and TF activity data can be downloaded from Zenodo.
suppressMessages(suppressWarnings(library(PUIC)))
suppressMessages(suppressWarnings(library(scATFR)))
suppressMessages(suppressWarnings(library(metaRegulon)))
atfr <- readRDS("D:/HSC_Autophagy/Markdown/5.metaRegulon/metaRegulon/inst/extdata/HSC_development/atfr.rds")
Next, we can extract the TF activity object act_sce that deposit in alternative assays from the atfr object for the following analysis.
act_sce <- altExp(x = atfr, e = "viper")
class(act_sce)
## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"
Now, lets take a look at the values in the act_sce object:
assay(act_sce)[1:5,1:5]
## E10.0_AEC_1_1 E10.0_AEC_1_2 E10.0_AEC_1_3 E10.0_AEC_1_4
## 2700081O15Rik 2.6691509 3.4384611 3.519365 2.8035412
## Adnp 0.9074099 1.5317953 1.168628 0.5021225
## Adnp2 0.3717352 0.0147019 1.855357 2.8388917
## Aebp1 -0.4412220 -1.2217474 -0.107998 -0.6754781
## Aebp2 9.1200368 10.7359366 10.183974 10.5986185
## E10.0_AEC_1_5
## 2700081O15Rik 3.189519
## Adnp 1.525927
## Adnp2 2.124061
## Aebp1 -1.506020
## Aebp2 9.507078
We can see that the values in act_sce have negative
values that may affect the following analysis steps, here we performed a
liner normalization step to scales the values into 0 to 10.
assay(act_sce) <- t(apply(assay(act_sce),1,function(x){10*(x-min(x))/(max(x)-min(x))}))
assay(act_sce)[1:5,1:5]
## E10.0_AEC_1_1 E10.0_AEC_1_2 E10.0_AEC_1_3 E10.0_AEC_1_4
## 2700081O15Rik 3.256182 4.5867276 4.726653 3.488614
## Adnp 5.732479 7.3491813 6.408844 4.683081
## Adnp2 1.642584 0.7662129 5.284269 7.698447
## Aebp1 8.256700 6.4896524 9.011094 7.726363
## Aebp2 4.952973 6.2213735 5.788111 6.113586
## E10.0_AEC_1_5
## 2700081O15Rik 4.156174
## Adnp 7.333986
## Adnp2 5.943830
## Aebp1 5.846082
## Aebp2 5.256781
Then, we can also generate another object exp_sce including the expression of TFs obtained form atfr object.
exp_sce <- atfr[row.names(act_sce),]
exp_sce
## class: SingleCellExperiment
## dim: 831 154
## metadata(2): scATFR atfr_version
## assays(2): counts logcounts
## rownames(831): 2700081O15Rik Adnp ... Zxdc Zzz3
## rowData names(0):
## colnames(154): E10.0_AEC_1_1 E10.0_AEC_1_2 ... Adult_HSC_3739
## Adult_HSC_3740
## colData names(3): sample stage sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(1): viper
Here we compared the performance of TF expression and TF activity on dimensional reduction analysis
library(scater)
exp_sce <- runPCA(x=exp_sce, exprs_values="logcounts")
exp_sce <- runTSNE(x = exp_sce, exprs_values="logcounts")
plotReducedDim(exp_sce, dimred="TSNE",colour_by = "stage")
We noticed that the order of stages is according to alphabet, but
this is not our desire order. To custom the order of the stages, we can
use reLevel() function:
exp_sce <- reLevel(x = exp_sce, colable="stage", level_order=c("AEC", "HEC", "T1_pre_HSC", "T2_pre_HSC", "FL_HSC", "Adult_HSC"))
p <- plotReducedDim(exp_sce, dimred="TSNE",colour_by = "stage")
p
The parameter colable indicate the column label used for order adjustment, while the level_order indicate the updated order.
Next, we use TF activity to perform dimensional reduction analysis
act_sce <- reLevel(x = act_sce, colable="stage", level_order=c("AEC", "HEC", "T1_pre_HSC", "T2_pre_HSC", "FL_HSC", "Adult_HSC"))
act_sce <- runPCA(x=act_sce, exprs_values="atfr")
act_sce <- runTSNE(x = act_sce, exprs_values="atfr")
q <- plotReducedDim(act_sce, dimred="TSNE",colour_by = "stage")
q
To compare the dimensional reduction between TF expression and TF activity and save the results, one can use the following codes:
ggplot2::ggsave(filename ="tf_expression_vs_activity_tSNE_res.pdf" ,plot = p+q, device = "pdf",
path = "./", width = 150, height =50, dpi = 600, limitsize = FALSE, units ="mm")
Here we introduce the demonstration of TFs at expression or activity
level with tSNE plots. Since the dimensional reduction results of TF
expression and TF activity is quit different, we use the tSNE result of
TF activity as the coordinate. Firstly, we add logcounts matrix from exp_sce into object act_sce:
assay(act_sce,"logcounts") <- assay(exp_sce,"logcounts")[row.names(act_sce),]
Then, we can compare the TF expression and TF activity of the key TFs as follow:
i <- "Hes1"
p1 <- plotReducedDim(object = act_sce,dimred = "TSNE", colour_by = i, by_exprs_values = "logcounts", point_size=0.5, point_alpha=1)
p2 <- plotReducedDim(object = act_sce,dimred = "TSNE", colour_by = i, by_exprs_values = "atfr", point_size=0.5, point_alpha=1)
p1+p2
In this section, we are going to give some examples of differential analysis at TF expression and TF activity level. The TF activity can perform nearly all the same analysis as the expression level did.
Here we performed the differential analysis between AECs and HECs at TF activity level using R package scran:
library(scran)
AEC_HEC_diff_act <- findMarkers(x = act_sce, groups=colData(act_sce)$stage, restrict=c("AEC","HEC"))
AEC_HEC_diff_act
## List of length 2
## names(2): AEC HEC
The results of differential activity analysis is a list of Data Frames.
HEC_pos_diff_act <- as.data.frame(AEC_HEC_diff_act$HEC)
HEC_pos_diff_act$Direction <- ifelse(HEC_pos_diff_act$FDR<0.05,ifelse(HEC_pos_diff_act$summary.logFC>0,'up','down'),'not_sig')
HEC_pos_diff_act$Direction <- factor(x = HEC_pos_diff_act$Direction,level=c('up','down','not_sig'))
head(HEC_pos_diff_act)
Top <int> | p.value <dbl> | FDR <dbl> | summary.logFC <dbl> | logFC.AEC <dbl> | Direction <fct> | |
|---|---|---|---|---|---|---|
| Nfe2 | 1 | 2.480302e-07 | 8.557341e-05 | 5.130534 | 5.130534 | up |
| Runx1 | 2 | 2.547060e-07 | 8.557341e-05 | 5.987795 | 5.987795 | up |
| Elf1 | 3 | 3.363936e-07 | 8.557341e-05 | 4.655217 | 4.655217 | up |
| Bcl11a | 4 | 4.119057e-07 | 8.557341e-05 | 5.082331 | 5.082331 | up |
| Myb | 5 | 8.725999e-07 | 1.329558e-04 | 5.044488 | 5.044488 | up |
| Hif1a | 6 | 9.599699e-07 | 1.329558e-04 | 3.950377 | 3.950377 | up |
Now, we can use volcano plot to visualize the results as follow:
library(ggplot2)
vol <- ggplot(data = HEC_pos_diff_act, mapping = aes(x = summary.logFC, y=-log10(p.value), color=Direction))+
geom_point(shape=16)+
scale_color_manual(values = c("red","blue","gray"))+
theme_bw()
vol
Here we use findAllCTSRegulons() to calculate the cell-type-specific TFs for all 6 stages at regulon level and expression level.
cts_act <- findAllCTSRegulons(x = act_sce, clust_name = 'stage', use_altExp = FALSE, use_assay = "atfr", nperm = 10000, ncores=4)
cts_exp <- findAllCTSRegulons(x = exp_sce, clust_name = 'stage', use_altExp = FALSE, use_assay = "logcounts",nperm = 10000, ncores=4)
Please note that if you want to calculate the cell-type-specific TFs for only few stages, you can use findCellTypeSignature() to calculate them. For example, To calculate the TFs that specifically activated in AEC:
AEC_cts_act <- findCellTypeSignature(x = act_sce, pos_clust="AEC", clust_name = 'stage',use_altExp = FALSE,nperm=10000,ncores=4)
Next, we want to identify TFs that only showed cell-type-specificity at activity level but not expression level:
library(purrr)
library(dplyr)
cts_act_list <- split(cts_act,cts_act$stage)
cts_exp_list <- split(cts_exp,cts_exp$stage)
cts_act_only <- purrr::map2(.x = cts_act_list, .y = cts_exp_list,.f = function(x,y){
x %>% filter(!(regulon %in% y$regulon))
})
cts_act_only <- do.call(rbind,cts_act_only)
head(cts_act_only)
stage <chr> | regulon <chr> | pos_score <dbl> | neg_score <dbl> | ave_diff <dbl> | CSS <dbl> | pval <dbl> | padj <dbl> |
|---|---|---|---|---|---|---|---|
| Adult_HSC | Cebpb | 7.507839 | 2.997325 | 4.510514 | 0.4615632 | 0 | 0 |
| Adult_HSC | Pgr | 7.613186 | 3.071952 | 4.541234 | 0.4598346 | 0 | 0 |
| Adult_HSC | Zfp382 | 7.822606 | 3.182685 | 4.639921 | 0.4591472 | 0 | 0 |
| Adult_HSC | Hdx | 5.124096 | 2.075604 | 3.048492 | 0.4574042 | 0 | 0 |
| Adult_HSC | Snai1 | 6.884967 | 2.931069 | 3.953898 | 0.4519103 | 0 | 0 |
| Adult_HSC | Mlxipl | 8.028379 | 3.452268 | 4.576110 | 0.4506860 | 0 | 0 |
To explore the distribution these cell-type-specific TFs, we utilize
the RadViz figure to visualize the distribution of these signature
genes. Firstly, we calculate the average activity of each TF in every
stage using function summarizeClusterValue().
mean_act_score <- summarizeClusterValue(object = act_sce, clust_name = "stage", gene_name = unique(cts_act_only$regulon), use_assay = "atfr")
mean_act_score <- data.frame(gene_name = row.names(mean_act_score), mean_act_score)
head(mean_act_score)
gene_name <chr> | Adult_HSC <dbl> | AEC <dbl> | FL_HSC <dbl> | HEC <dbl> | T1_pre_HSC <dbl> | T2_pre_HSC <dbl> | |
|---|---|---|---|---|---|---|---|
| Cebpb | Cebpb | 7.507839 | 3.388991 | 3.575933 | 3.367401 | 1.392066 | 3.228716 |
| Pgr | Pgr | 7.613186 | 4.464197 | 3.601594 | 3.114728 | 1.437119 | 2.904680 |
| Zfp382 | Zfp382 | 7.822606 | 4.167843 | 4.057007 | 2.714360 | 1.788604 | 3.046008 |
| Hdx | Hdx | 5.124096 | 1.779172 | 2.905642 | 1.240981 | 1.721782 | 2.278481 |
| Snai1 | Snai1 | 6.884967 | 4.609386 | 3.238935 | 3.205787 | 1.655094 | 2.140615 |
| Mlxipl | Mlxipl | 8.028379 | 3.238340 | 3.895441 | 3.508666 | 2.681085 | 3.775979 |
Then, we use the hexagonal density plot to visualize the distribution of cell-type-specific TFs as follow:
cts_act_frm <- left_join(x = cts_act_only, y = mean_act_score,by=c("regulon"="gene_name"))
stages <- c("AEC", "HEC", "T1_pre_HSC", "T2_pre_HSC","FL_HSC","Adult_HSC")
cts_act_frm$stage <- factor(cts_act_frm$stage,levels=stages)
rv1 <- RadvizEnrichPlot(x = cts_act_frm, color_by='stage', anchors = stages, optim_anchor = F, recenter = stages, text_size = 4, plot_type = "hexagonal")
rv1
From the above result, we can see that most of the cell-type-specific TFs only at activity level fall into fetal liver HSCs and adult bone marrow HSCs. Next, to further explore the disribution of these cell-type-specific TFs, we colored the points with their cell-type-specificity label.
rv2 <- RadvizEnrichPlot(x = cts_act_frm, color_by='stage', anchors = stages, optim_anchor = F, recenter = stages, text_size = 4)
rv2
In the above figure, we found that the cell-type-specificity label
showed relative high correlation with their location. Note that the rv generate from RadvizEnrichPlot() function is a ggplot object, which means that you can modify this object with functions in ggplot2 package (e.g. adding gene labels).
candidates <- c("Sox6","Ddit3","Cebpb")
names(candidates) <- candidates
cts_act_frm$gene_label <- candidates[cts_act_frm$regulon]
rv2 <- RadvizEnrichPlot(x = cts_act_frm, color_by='stage', anchors = stages, optim_anchor = F, recenter = stages, text_size = 4)+
geom_text(mapping = aes(label=gene_label),na.rm = TRUE)
rv2
Besides, we noticed that some of the above TFs showed cell-type-specificity in more than one cell types. To better describe cell-type-specific distribution among these cell types, we can also use Venn diagram to visualize these TFs:
library(UpSetR)
##
## 载入程辑包:'UpSetR'
## The following object is masked from 'package:lattice':
##
## histogram
cts_act_only_stat <- table(cts_act_only$regulon,cts_act_only$stage)
attributes(cts_act_only_stat)$class <- "matrix"
data_set <- as.data.frame(cts_act_only_stat)
data_set <- data_set[,stages]
setR <- upset(data_set, nsets = 6, nintersects = 40, mb.ratio = c(0.5, 0.5),
order.by = c("degree","freq"), decreasing = c(F,TRUE), point.size=5,
line.size=2)
setR
Since a TF regulon includes a group of targets, which may participate some pathways. We use Jaccard test to estimate the similarity between regulons and pathways as follow:
data(Reactome_pathway)
mm_pathway <- Reactome_pathway$`Mus musculus`
cts_act_HEC <- cts_act_frm %>% filter(stage %in% "HEC")
cts_act_HEC_regulon <- as.list(Regulon(atfr)[cts_act_HEC$regulon])
enrich_res <- calcuModuleSimilar(tfr_list = cts_act_HEC_regulon, pathway_list = mm_pathway,fdr = 0.01, min_overlap = 10, ncores = 4)
head(enrich_res)
regulon <chr> | pathway <chr> | similarity <dbl> | pval <dbl> | padj <dbl> | |
|---|---|---|---|---|---|
| Zfp322a | Metabolism | 0.01690270 | 1.796942e-05 | 7.773865e-03 | |
| Srebf1 | Metabolism of lipids | 0.02275960 | 2.895714e-05 | 5.424638e-03 | |
| Ncoa2 | Metabolism of RNA | 0.02853261 | 3.619974e-05 | 2.034425e-03 | |
| Ncoa2 | Translation | 0.02863962 | 6.451759e-06 | 4.532361e-04 | |
| Fli1 | Fc epsilon receptor (FCERI) signaling | 0.02222222 | 5.035663e-05 | 5.660085e-03 | |
| Fli1 | Metabolism of RNA | 0.04581006 | 2.183843e-10 | 2.454639e-07 |
Since one regulon may enriched with multiple pathways, we merge them by the number of shared genes among every two pathways and set the pathway term with smallest p-value as the representative term. Meanwhile, we can use Sankey diagram to visualize the relationships between TFs and enriched pathways.
library(easyalluvial)
library(parcats)
enrich_res <- enrich_res %>% filter(firstInGroup == 1)
p = alluvial_wide(enrich_res, max_variables = 2, fill_by = "first_variable",
colorful_fill_variable_stratum=F,col_vector_value=NA)
parcats(p, marginal_histograms = FALSE, data_input = data, height=500,width = 500)
We use the TF symbols to represent the regulons and we can easily visualize the pathways of the regulons.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936
## [2] LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] parcats_0.0.2 easyalluvial_0.3.0
## [3] UpSetR_1.4.0 scran_1.20.1
## [5] metaRegulon_0.1.0 scATFR_0.1.9
## [7] stringr_1.4.0 progress_1.2.2
## [9] kSamples_1.2-9 SuppDists_1.1-9.7
## [11] glmnet_4.1-3 Matrix_1.3-4
## [13] factoextra_1.0.7 cvTools_0.3.2
## [15] robustbase_0.93-9 lattice_0.20-44
## [17] Radviz_0.9.2 ROCR_1.0-11
## [19] AUCell_1.14.0 scater_1.20.1
## [21] ggplot2_3.3.5 scuttle_1.2.0
## [23] jaccard_0.1.0 qvalue_2.24.0
## [25] dplyr_1.0.8 ppcor_1.1
## [27] MASS_7.3-54 scLink_1.0.1
## [29] glasso_1.11 RcisTarget_1.12.0
## [31] fgsea_1.18.0 SeuratObject_4.0.4
## [33] Seurat_4.1.0 GENIE3_1.14.0
## [35] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
## [37] Biobase_2.52.0 GenomicRanges_1.44.0
## [39] GenomeInfoDb_1.28.1 IRanges_2.26.0
## [41] S4Vectors_0.30.0 BiocGenerics_0.38.0
## [43] MatrixGenerics_1.4.0 matrixStats_0.59.0
## [45] PUIC_0.1.0 reshape2_1.4.4
## [47] minet_3.50.0 pbapply_1.5-0
## [49] purrr_0.3.4
##
## loaded via a namespace (and not attached):
## [1] scattermore_0.8 R.methodsS3_1.8.1
## [3] tidyr_1.2.0 bit64_4.0.5
## [5] knitr_1.33 irlba_2.3.3
## [7] DelayedArray_0.18.0 R.utils_2.10.1
## [9] data.table_1.14.0 rpart_4.1-15
## [11] KEGGREST_1.32.0 RCurl_1.98-1.3
## [13] generics_0.1.2 ScaledMatrix_1.0.0
## [15] cowplot_1.1.1 RSQLite_2.2.7
## [17] RANN_2.6.1 future_1.24.0
## [19] bit_4.0.4 lubridate_1.7.10
## [21] spatstat.data_2.1-2 httpuv_1.6.3
## [23] assertthat_0.2.1 viridis_0.6.2
## [25] gower_0.2.2 xfun_0.24
## [27] hms_1.1.1 jquerylib_0.1.4
## [29] evaluate_0.15 promises_1.2.0.1
## [31] DEoptimR_1.0-10 fansi_0.5.0
## [33] igraph_1.2.6 DBI_1.1.1
## [35] htmlwidgets_1.5.4 spatstat.geom_2.3-2
## [37] ellipsis_0.3.2 annotate_1.70.0
## [39] deldir_1.0-6 sparseMatrixStats_1.4.0
## [41] vctrs_0.3.8 ggalluvial_0.12.3
## [43] abind_1.4-5 cachem_1.0.5
## [45] withr_2.5.0 progressr_0.8.0
## [47] sctransform_0.3.3 prettyunits_1.1.1
## [49] goftest_1.2-3 cluster_2.1.2
## [51] lazyeval_0.2.2 crayon_1.5.1
## [53] arrow_4.0.1 edgeR_3.34.0
## [55] recipes_0.1.16 pkgconfig_2.0.3
## [57] labeling_0.4.2 nlme_3.1-152
## [59] vipor_0.4.5 nnet_7.3-16
## [61] rlang_1.0.2 globals_0.14.0
## [63] lifecycle_1.0.1 miniUI_0.1.1.1
## [65] rsvd_1.0.5 polyclip_1.10-0
## [67] lmtest_0.9-38 graph_1.70.0
## [69] zoo_1.8-9 beeswarm_0.4.0
## [71] ggridges_0.5.3 png_0.1-7
## [73] viridisLite_0.4.0 bitops_1.0-7
## [75] R.oo_1.24.0 KernSmooth_2.23-20
## [77] Biostrings_2.60.1 blob_1.2.1
## [79] DelayedMatrixStats_1.14.0 shape_1.4.6
## [81] parallelly_1.30.0 spatstat.random_2.1-0
## [83] beachmat_2.8.0 scales_1.1.1
## [85] memoise_2.0.0 GSEABase_1.54.0
## [87] magrittr_2.0.1 plyr_1.8.6
## [89] hexbin_1.28.2 ica_1.0-2
## [91] zlibbioc_1.38.0 compiler_4.1.1
## [93] dqrng_0.3.0 RColorBrewer_1.1-2
## [95] fitdistrplus_1.1-8 cli_3.0.0
## [97] XVector_0.32.0 listenv_0.8.0
## [99] patchwork_1.1.1 mgcv_1.8-36
## [101] tidyselect_1.1.2 stringi_1.6.2
## [103] forcats_0.5.1 highr_0.9
## [105] yaml_2.2.1 BiocSingular_1.8.1
## [107] locfit_1.5-9.4 ggrepel_0.9.1
## [109] grid_4.1.1 sass_0.4.1
## [111] fastmatch_1.1-0 tools_4.1.1
## [113] future.apply_1.8.1 rstudioapi_0.13
## [115] bluster_1.2.1 foreach_1.5.2
## [117] metapod_1.0.0 gridExtra_2.3
## [119] prodlim_2019.11.13 farver_2.1.0
## [121] Rtsne_0.15 digest_0.6.27
## [123] lava_1.6.10 shiny_1.7.1
## [125] Rcpp_1.0.7 later_1.3.0
## [127] RcppAnnoy_0.0.19 httr_1.4.2
## [129] AnnotationDbi_1.54.1 colorspace_2.0-2
## [131] XML_3.99-0.6 tensor_1.5
## [133] reticulate_1.20 splines_4.1.1
## [135] uwot_0.1.11 statmod_1.4.36
## [137] spatstat.utils_2.3-0 plotly_4.10.0
## [139] xtable_1.8-4 jsonlite_1.7.2
## [141] timeDate_3043.102 ipred_0.9-11
## [143] R6_2.5.1 pillar_1.7.0
## [145] htmltools_0.5.2 mime_0.12
## [147] glue_1.4.2 fastmap_1.1.0
## [149] BiocParallel_1.26.1 BiocNeighbors_1.10.0
## [151] class_7.3-19 codetools_0.2-18
## [153] utf8_1.2.1 bslib_0.3.1
## [155] spatstat.sparse_2.1-0 tibble_3.1.2
## [157] ggbeeswarm_0.6.0 leiden_0.3.9
## [159] survival_3.2-11 limma_3.48.1
## [161] rmarkdown_2.9 munsell_0.5.0
## [163] GenomeInfoDbData_1.2.6 iterators_1.0.14
## [165] gtable_0.3.0 spatstat.core_2.4-0